For difficult tasks that need planning and careful thinking, basic prompting methods often aren't enough. Yao et al. (2023) and Long (2023) introduced the Tree of Thoughts (ToT) framework, which builds on the idea of "chain-of-thought" reasoning. Instead of following one straight line of reasoning, ToT encourages exploring different possible thought paths to solve problems using language models (LMs).

How Tree of Thoughts Works

In ToT, a "tree of thoughts" is created, where each branch of the tree represents a different step or idea in the problem-solving process. Each thought is a piece of reasoning that helps move closer to the solution. The LM can evaluate these thoughts as it goes along, deciding whether they're helpful or not. ToT uses search methods like breadth-first search (BFS) or depth-first search (DFS) to explore different thought paths, allowing the model to look ahead and backtrack if needed.

For example, if you were solving a puzzle, you wouldn’t just go with the first idea that comes to mind. You’d explore different ideas, see how far each one gets you, and decide whether to keep going or try something else. That’s essentially what ToT does for an LM, letting it "think" in a more strategic way.

Example: The Game of 24

Let’s take the Game of 24 as an example (a task where you need to manipulate numbers to make 24). In the ToT framework, the task is broken down into 3 steps, each representing a different part of the calculation. At each step, the best 5 thought candidates are kept for further evaluation.

When the model is evaluating its progress, it labels each candidate solution as "sure," "maybe," or "impossible." For instance, if a candidate is too far off from 24 (like a result that’s much too high or low), it might be labeled "impossible" and discarded. The model repeats this process, trying different possibilities and narrowing down the best solutions step by step.

Search Strategies in ToT

In both Yao et al. (2023) and Long (2023), tree search methods like DFS, BFS, and beam search are used to explore the thought tree. Long (2023), however, introduces a ToT Controller, which is a system trained using reinforcement learning (RL). This controller helps decide when to backtrack or explore a new path, making it smarter at learning from experience, like AlphaGo learning to play Go by self-play. While the basic search methods are general and work for many problems, Long’s RL-based approach can adapt and improve over time as it learns new strategies.

Simple ToT Prompting

Hulbert (2023) simplified the ToT framework into a prompting method called Tree-of-Thought Prompting. Instead of fully exploring the thought tree, this technique asks the model to think in steps, prompting it to evaluate intermediate ideas. For example:

"Imagine three experts are trying to solve this problem. Each expert will write down one step of their thinking and share it with the group. If any expert realizes they’re wrong, they will stop. The question is…"

This method keeps the model focused on evaluating its reasoning at every step, without needing a complex tree search.

Conclusion

The Tree of Thoughts framework gives language models a structured way to think through complex problems by breaking them down into smaller steps, exploring different options, and self-evaluating. This approach is especially useful for tasks requiring deep reasoning and careful planning, and can even evolve through reinforcement learning.